225 research outputs found
ECL: Class-Enhancement Contrastive Learning for Long-tailed Skin Lesion Classification
Skin image datasets often suffer from imbalanced data distribution,
exacerbating the difficulty of computer-aided skin disease diagnosis. Some
recent works exploit supervised contrastive learning (SCL) for this long-tailed
challenge. Despite achieving significant performance, these SCL-based methods
focus more on head classes while ignoring the information in tail classes. In
this paper, we propose Class-Enhancement Contrastive Learning
(ECL), which enriches the information of minority classes and treats different
classes equally. For information enhancement, we design a hybrid-proxy model to
generate class-dependent proxies and propose a cycle update strategy for
parameter optimization. A balanced-hybrid-proxy loss is designed to exploit
relations between samples and proxies with different classes treated equally.
Taking both "imbalanced data" and "imbalanced diagnosis difficulty" into
account, we further present a balanced-weighted cross-entropy loss that
follows a curriculum learning schedule. Experimental results on the
classification of imbalanced skin lesion data demonstrate the superiority and
effectiveness of our method.
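As an illustration of the class-balancing idea (not the authors' exact formulation), the sketch below computes a proxy-based contrastive loss and averages it within each class before averaging across classes, so head and tail classes contribute equally. The function name, the single-proxy-per-class setup, and the temperature value are all assumptions.

```python
import numpy as np

def balanced_proxy_contrastive_loss(feats, labels, proxies, temp=0.1):
    """Hypothetical sketch: pull each sample toward its class proxy and away
    from other proxies; per-class averaging weights all classes equally."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    proxies = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = feats @ proxies.T / temp            # (N, C) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_sample = -log_prob[np.arange(len(labels)), labels]
    # Average within each class first, then across classes: a majority class
    # with many samples gets no more weight than a minority class.
    classes = np.unique(labels)
    return np.mean([per_sample[labels == c].mean() for c in classes])
```

With a plain per-sample mean instead of the final per-class mean, head-class samples would dominate the gradient, which is exactly the failure mode the abstract attributes to prior SCL-based methods.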
TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis
Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success by
modern computer-aided diagnosis technology based on deep convolutions. However,
the information aggregation across modalities in MSLD remains challenging due
to severely misaligned spatial resolutions (dermoscopic versus clinical
images) and heterogeneous data (dermoscopic images versus patients'
meta-data). Limited by
the intrinsic local attention, most recent MSLD pipelines using pure
convolutions struggle to capture representative features in shallow layers,
thus the fusion across different modalities is usually done at the end of the
pipelines, even at the last layer, leading to insufficient information
aggregation. To tackle this issue, we introduce a pure transformer-based
method, which we refer to as ``Throughout Fusion Transformer (TFormer)", for
sufficient information integration in MSLD. Different from the existing approaches with
convolutions, the proposed network leverages a transformer as the
feature-extraction backbone, bringing more representative shallow features. We then carefully
design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks
to fuse information across different image modalities in a stage-by-stage way.
With the aggregated information of image modalities, a multi-modal transformer
post-fusion (MTP) block is designed to integrate features across image and
non-image data. This strategy of first fusing the image modalities and then
the heterogeneous meta-data lets us divide and conquer the two major
challenges while ensuring that inter-modality dynamics are effectively
modeled. Experiments conducted on the public Derm7pt dataset
validate the superiority of the proposed method. Our TFormer outperforms other
state-of-the-art methods. Ablation experiments also suggest the effectiveness
of our designs.
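A minimal sketch of the fusion order described above, with toy mixing functions standing in for the HMT and MTP blocks (the real blocks are transformers; the function names and the symmetric-mixing rule here are purely illustrative):

```python
import numpy as np

def fuse_stage(derm, clin):
    """Stand-in for a dual-branch HMT block: exchange information between the
    dermoscopic and clinical branches (here, a symmetric additive mixing)."""
    mixed = 0.5 * (derm + clin)
    return derm + mixed, clin + mixed

def tformer_like_forward(derm, clin, meta, n_stages=3):
    """Image modalities are fused stage-by-stage first; the heterogeneous
    non-image meta-data is integrated only afterwards (MTP stand-in)."""
    for _ in range(n_stages):
        derm, clin = fuse_stage(derm, clin)
    image_feat = np.concatenate([derm, clin])
    return np.concatenate([image_feat, meta])
```

The point of the sketch is the ordering: spatially misaligned image modalities interact repeatedly through the stages, while the heterogeneous meta-data joins once at the end.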
OvarNet: Towards Open-vocabulary Object Attribute Recognition
In this paper, we consider the problem of simultaneously detecting objects
and inferring their visual attributes in an image, even for those with no
manual annotations provided at the training stage, resembling an
open-vocabulary scenario. To achieve this goal, we make the following
contributions: (i) we start with a naive two-stage approach for open-vocabulary
object detection and attribute classification, termed CLIP-Attr. The candidate
objects are first proposed with an offline RPN and later classified for
semantic category and attributes; (ii) we combine all available datasets and
train with a federated strategy to finetune the CLIP model, aligning the visual
representation with attributes; additionally, we investigate the efficacy of
leveraging freely available online image-caption pairs under weakly supervised
learning; (iii) in pursuit of efficiency, we train a Faster-RCNN type model
end-to-end with knowledge distillation, that performs class-agnostic object
proposals and classification on semantic categories and attributes with
classifiers generated from a text encoder; Finally, (iv) we conduct extensive
experiments on VAW, MS-COCO, LSA, and OVAD datasets, and show that recognition
of semantic category and attributes is complementary for visual scene
understanding, i.e., jointly training object detection and attribute
prediction largely outperforms existing approaches that treat the two tasks
independently, demonstrating strong generalization ability to novel attributes
and categories.
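The classifiers-generated-from-a-text-encoder idea in (iii) can be sketched as zero-shot cosine-similarity classification: region features are scored against embeddings of category or attribute names. The function and the embeddings below are hypothetical stand-ins for CLIP-style features, not OvarNet's actual head.

```python
import numpy as np

def classify_regions(region_feats, text_embs):
    """Illustrative zero-shot head: classifier weights are text-encoder
    embeddings of category/attribute names; prediction is the most
    cosine-similar embedding per region."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return (r @ t.T).argmax(axis=1)
```

Because the classifier rows come from text rather than learned per-class weights, novel categories or attributes can be recognized at test time simply by embedding their names, which is what makes the open-vocabulary setting possible.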
Continuous Remote Sensing Image Super-Resolution based on Context Interaction in Implicit Function Space
Despite its fruitful applications in remote sensing, image super-resolution
is cumbersome to train and deploy because it handles different magnifications
with separate models. Accordingly, we propose a
highly-applicable super-resolution framework called FunSR, which settles
different magnifications with a unified model by exploiting context interaction
within implicit function space. FunSR comprises a functional representor, a
functional interactor, and a functional parser. Specifically, the representor
transforms the low-resolution image from Euclidean space to multi-scale
pixel-wise function maps; the interactor enables pixel-wise function expression
with global dependencies; and the parser, which is parameterized by the
interactor's output, converts the discrete coordinates with additional
attributes to RGB values. Extensive experimental results demonstrate that
FunSR achieves state-of-the-art performance in both fixed-magnification and
continuous-magnification settings; meanwhile, its unified nature enables many
convenient applications.
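A toy sketch of the parser stage under stated assumptions: a parser whose parameters would come from the interactor maps a continuous coordinate plus a cell-size attribute to RGB, so a single model serves any magnification simply by sampling a denser coordinate grid. The linear-plus-tanh parser and all names here are placeholders, not FunSR's architecture.

```python
import numpy as np

def parser(coord, cell, params):
    """Hypothetical parser: continuous (y, x) coordinate plus cell size
    ('additional attributes') -> RGB, using interactor-supplied weights."""
    w, b = params
    x = np.concatenate([coord, cell])
    return np.tanh(x @ w + b)  # (3,) RGB in [-1, 1]

def render(params, h, w):
    """One model, any magnification: just query a denser coordinate grid."""
    ys, xs = np.linspace(-1, 1, h), np.linspace(-1, 1, w)
    cell = np.array([2.0 / h, 2.0 / w])
    return np.stack([[parser(np.array([y, x]), cell, params)
                      for x in xs] for y in ys])
```

Changing `h` and `w` changes the output resolution without retraining, which is the practical benefit the abstract claims over training one model per magnification.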
A Novel Interpolation Fingerprint Localization Supported by Back Propagation Neural Network
In view of people's increasing demand for location-aware services, high-accuracy indoor localization has become a top priority for location-based services (LBS). The compact and cost-effective ZigBee technology, with its low power dissipation, is therefore a natural option for indoor localization within small areas. However, because their accuracy cannot satisfy application requirements, traditional ZigBee-based localization algorithms are gradually being abandoned. This paper proposes a novel ZigBee-based indoor fingerprint localization algorithm and optimizes it with a back-propagation neural network (BPNN) interpolation method. Simulation results show that this algorithm can significantly reduce the number of fingerprints while improving localization accuracy.
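For context, the fingerprint-matching stage that a densified (e.g. BPNN-interpolated) fingerprint database would feed into can be sketched as weighted k-nearest-neighbour matching in RSSI space. This is a generic textbook baseline, not the paper's algorithm; all names are assumptions.

```python
import numpy as np

def knn_fingerprint_locate(rssi, fingerprints, positions, k=3):
    """Generic fingerprint matching: find the k reference fingerprints
    closest to the measured RSSI vector and average their known positions,
    weighted by inverse RSSI-space distance."""
    d = np.linalg.norm(fingerprints - rssi, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-6)  # epsilon avoids division by zero
    return (positions[idx] * w[:, None]).sum(axis=0) / w.sum()
```

Interpolating extra fingerprints (as the BPNN does in the paper) densifies `fingerprints`/`positions`, which is why fewer physically surveyed reference points can still yield good accuracy.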
A Trilaminar Data Fusion Localization Algorithm Supported by Sensor Network
To overcome problems such as low accuracy and the difficulty of evaluating performance, this paper applies weighted trilaminar data fusion of LS-RSSI to improve the initial localization estimates, building on an analysis of the least squares (LS) and Received Signal Strength Indication (RSSI) algorithms. The result is a trilaminar LS-RSSI data fusion localization algorithm that yields better-optimized localization estimates. The algorithm requires only a limited number of calculations and reduces localization errors. As shown in simulation, the trilaminar data fusion technology produces much more accurate and stable localization estimates.
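The weighted-fusion idea can be illustrated with a generic inverse-variance combination of position estimates (e.g. one from an LS stage and one from an RSSI stage). The weighting scheme below is an assumption for illustration, not the paper's trilaminar formulation.

```python
import numpy as np

def fuse_estimates(estimates, variances):
    """Generic weighted fusion: combine several (x, y) position estimates,
    trusting lower-variance estimates more (inverse-variance weights)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    w = w / w.sum()
    return (np.asarray(estimates, dtype=float) * w[:, None]).sum(axis=0)
```

With equal variances this reduces to a plain average; as one estimator becomes noisier, the fused result leans toward the more reliable one, which is the intuition behind fusing LS and RSSI outputs instead of using either alone.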
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Deep learning has driven the rapid development of text recognition in recent
years. However, existing text recognition methods are mainly designed for
English text. Chinese is another widely spoken language, and Chinese text
recognition (CTR) has extensive application markets. Based on our
observations, we attribute the scarce attention on CTR to
the lack of reasonable dataset construction standards, unified evaluation
protocols, and results of the existing baselines. To fill this gap, we manually
collect CTR datasets from publicly available competitions, projects, and
papers. According to application scenarios, we divide the collected datasets
into four categories including scene, web, document, and handwriting datasets.
Besides, we standardize the evaluation protocols in CTR. With unified
evaluation protocols, we evaluate a series of representative text recognition
methods on the collected datasets to provide baselines. The experimental
results indicate that the performance of baselines on CTR datasets is not as
good as that on English datasets, owing to characteristics of Chinese text
that differ substantially from the Latin alphabet. Moreover, we observe that by
introducing radical-level supervision as an auxiliary task, the performance of
baselines can be further boosted. The code and datasets are made publicly
available at https://github.com/FudanVI/benchmarking-chinese-text-recognition
The role of 245 phase in alkaline iron selenide superconductors revealed by high pressure studies
Here we show that a pressure of about 8 GPa suppresses both the vacancy order
and the insulating phase, and a further increase of the pressure to about 18
GPa induces a second transition or crossover. No superconductivity has been
found in the compressed insulating 245 phase. The metallic phase in the
intermediate pressure range has a distinct behavior in the transport property,
which is also observed in the superconducting sample. We interpret this
intermediate metal as an orbital selective Mott phase (OSMP). Our results
suggest that the OSMP provides the physical pathway connecting the insulating
and superconducting phases of these iron selenide materials.